broad coverage grammars appear for many applications unreasonably
large in relation to the relative simplicity of the task at hand. On
the other hand, handcrafted development of application-dependent
grammars is in danger of duplicating work which is then difficult to
re-use in other contexts of application. To overcome this problem, we
present in this paper a procedure for the automatic extraction of
application-tuned consistent subgrammars from proven large-scale
generation grammars. The procedure has been implemented for
large-scale systemic grammars and builds on the formal equivalence
between systemic grammars and typed unification based grammars. Its
evaluation for the generation of encyclopedia entries is described,
and directions of future development, applicability, and extensions
are discussed.

1 Introduction

Although we have reached a situation in computational linguistics
where large coverage grammars are well developed and available in
several formal traditions, the use of these research results in actual
applications and for application to specific domains is still
unsatisfactory. One reason for this is that large-scale grammar
specifications incur a seemingly unnecessarily large burden of space
and processing time that often does not stand in relation to the
simplicity of the particular task. The usual alternatives for natural
language generation to date have been the handcrafted development of
application or sublanguage specific grammars or the use of template
based generation grammars. In (Busemann, 1996) both approaches are
combined, resulting in a practical small generation grammar tool. But
still the grammars are handwritten or, if extracted from large
grammars, must be adapted by hand. In general, both the template and
the handwritten application grammar approach compromise the idea of a
general NLP system architecture with reusable bodies of general
linguistic resources.

We argue that this customization bottleneck 
can be overcome by the automatic extraction of 
application-tuned consistent generation subgrammars
from given, proven large-scale grammars. In
this paper we present such an automatic subgram- 
mar extraction tool. The underlying procedure is 
valid for grammars written in typed unification for- 
malisms; it is here carried out for systemic grammars 
within the development environment for text gener- 
ation KPML (Bateman, 1997). The input is a set of 
semantic specifications covering the intended appli- 
cation. This can either be provided by generating a 
predefined test suite or be automatically produced 
by running the particular application during a train- 
ing phase. 

The paper is structured as follows. First, an al- 
gorithm for automatic subgrammar extraction for 
arbitrary systemic grammars will be given, and sec- 
ond the application of the algorithm for generation 
in the domain of 'encyclopedia entries' will be illus- 
trated. To conclude, we discuss several issues raised 
by the work described, including its relevance for 
typed unification based grammar descriptions and 
the possibilities for further improvements in genera- 
tion time. 

2 Grammar extraction algorithm 



This work was partially supported by the DAAD
through grant D/96/17139.



Systemic Functional Grammar (SFG) (Halliday, 1985) is based on the
assumption that the differentiation of syntactic phenomena is always
determined by its function in the communicative context. This
functional orientation has led to the creation of detailed linguistic
resources that are characterized by an integrated treatment of
content-related, textual and pragmatic aspects. Computational
instances of systemic grammar are successfully employed in some of the
largest and most influential text generation projects, such as, for
example, PENMAN (Mann, 1983), COMMUNAL (Fawcett and Tucker, 1990),
TECHDOC (Rösner and Stede, 1994), Drafter (Paris and Vander Linden,
1996), and Gist (Not and Stock, 1994).

For our present purposes, however, it is the for- 
mal characteristics of systemic grammar and its im- 
plementations that are more important. Systemic 
grammar assumes multifunctional constituent struc- 
tures representable as feature structures with coref- 
erences. As shown in the following function struc- 
ture example for the sentence "The people that buy 
silver love it.", different functions can be filled by 
one and the same constituent: 



    clause
      Senser:      [1] nominal-group
                       Deictic:   det [ Spelling: "the" ]
                       Thing:     noun [ Spelling: "people" ]
                       Qualifier: dependent-clause
                                  [ Spelling: "that buy silver" ]
      Process:     finite [ Spelling: "love" ]
      Phenomenon:  [2] nominal-group
                       Thing:     pronoun [ Spelling: "it" ]
      Subject:          [1]
      Theme:            [1]
      Directcomplement: [2]

Given the notational equivalence of HPSG and systemic grammar first
mentioned by (Carpenter, 1992) and (Zajac, 1992), and further
elaborated in (Henschel, 1995), one can characterize a systemic
grammar as a large type hierarchy with multiple (conjunctive and
disjunctive) and multi-dimensional inheritance with an open-world
semantics. The basic element of a systemic grammar, a so-called
system, is a type axiom of the form (adopting the notation of CUF
(Dörre et al., 1996)):

    entry = type_1 | type_2 | ... | type_n.

where type_1 to type_n are exhaustive and disjoint subtypes of type
entry. entry need not necessarily be a single type; it can be a
logical expression over types formed with the connectors AND and OR.
A systemic grammar therefore resembles more a type lattice than a type
hierarchy in the HPSG tradition. In systemic grammar, these basic type
axioms, the systems, are named; we will use entry(s) to denote the
left-hand side of some named system s, and out(s) to denote the set of
subtypes {type_1, type_2, ..., type_n}, the output of the system. The
following type axioms taken from the large systemic English grammar
NIGEL (Matthiessen, 1983) shall illustrate the nature of systems in a
systemic grammar:

    nominal_group = class_name | individual_name.
    nominal_group = wh_nominal | nonwh_nominal.
    (OR class_name wh_nominal) = singular | plural.

The meaning of these type axioms is fairly obvious: on the one hand,
nominal groups can be subcategorized into class-names and
individual-names; on the other hand, they can be subcategorized with
respect to their WH-containment into WH-containing nominal-groups and
nominal-groups without WH-element. The singular/plural opposition is
valid for class-names as well as for WH-containing nominal groups (be
they class or individual names), but not for individual-names without
WH-element.

Systemic types inherit constraints with respect to appropriate
features, their filler types, coreferences and order. Here are the
constraints for some of the types defined above:

    nominal-group    [Thing: noun]
    class-name       [Thing: common-noun, Deictic: top]
    individual-name  [Thing: proper-noun]
    wh-nominal       [Wh: top]

Universal principles and rules are in systemic grammar not factored
out. The lexicon contains stem forms and has a detailed word class
type hierarchy at its top. Morphology is also organized as a monotonic
type hierarchy. Currently used implementations of SFG are the PENMAN
system (Penman Project, 1989), the KPML system (Bateman, 1997) and
WAG-KRL (O'Donnell, 199?).
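
To make the notion of a system as a type axiom concrete, the following minimal Python sketch encodes the three NIGEL axioms quoted in this section. It is an illustration under stated assumptions, not the representation used in KPML: entries are stored in disjunctive normal form as sets of conjunctions, and the system names (`nominal_class`, `wh_containment`, `number`) are hypothetical, since the axioms are quoted here without their names.

```python
from dataclasses import dataclass


def dnf(*conjunctions):
    """Build an entry in disjunctive normal form: a set of conjunctions."""
    return frozenset(frozenset(c) for c in conjunctions)


@dataclass(frozen=True)
class System:
    name: str          # hypothetical name, not taken from NIGEL
    entry: frozenset   # entry(s): DNF over type names
    out: frozenset     # out(s): exhaustive, disjoint subtypes


SYSTEMS = [
    System("nominal_class", dnf({"nominal_group"}),
           frozenset({"class_name", "individual_name"})),
    System("wh_containment", dnf({"nominal_group"}),
           frozenset({"wh_nominal", "nonwh_nominal"})),
    # (OR class_name wh_nominal) = singular | plural.
    System("number", dnf({"class_name"}, {"wh_nominal"}),
           frozenset({"singular", "plural"})),
]


def entry_satisfied(system, selected):
    """An entry holds if at least one of its conjunctions is fully selected."""
    return any(conj <= selected for conj in system.entry)


def applicable_systems(selected):
    """Names of all systems whose entry condition is met by the selected types."""
    return [s.name for s in SYSTEMS if entry_satisfied(s, selected)]
```

With this encoding, an individual-name without WH-element does not reach the singular/plural system, whereas a WH-containing individual-name does, which mirrors the reading of the axioms given in the text.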



Our subgrammar extraction has been applied and 
tested in the context of the KPML environment. 
KPML adopts the processing strategy of the PENMAN system and so it is
necessary to briefly describe this strategy. PENMAN performs a
semantically driven top-down traversal through the grammatical type
hierarchy for every constituent. Passed types are collected and their
feature constraints are unified to build a resulting feature
structure. Substructure generation requires an additional grammar
traversal controlled by the feature values given in the
superstructure. In addition to the grammar in its original sense, the
PENMAN system provides a particular interface between grammar and
semantics. This interface is organized with the help of so-called
choosers: decision trees associated with each system of the grammar
which control the selection of an appropriate subtype during
traversal. Choosers should be seen as a practical means of enabling
applications (including text planners) to interact with the grammar
using purely semantic specifications even though a fully specified
semantic theory may not yet be available for certain important areas
necessary for coherent, fluent text generation. They also serve to
enforce deterministic choice, an important property for practical
generation (cf. (Reiter, 1994)).



The basic form of a chooser node is as follows:

    (ask query
         (answer1 actions)
         (answer2 actions)
         ...)

The nodes in a chooser are queries to the seman- 
tics, the branches contain a set of actions including 
embedded queries. Possible chooser actions are the 
following: 

    (ask query (..) ... (..))
    (choose type)
    (identify function concept)
    (copyhub function1 function2)

A choose action (choose type) of a chooser explicitly
selects one of the output types of its associated sys-
tem. In general, there can be several paths through
a given chooser that lead to the selection of a sin- 
gle grammatical type: each such path corresponds 
to a particular configuration of semantic properties 
sufficient to motivate the grammatical type selected. 
Besides this, choosers serve to create a binding be- 
tween given semantic objects and grammatical con- 
stituents to be generated. This is performed by the 
action (identify function concept). Because of the 
multifunctionality assumed for the constituent struc- 
ture in systemic grammar, two grammatical func- 
tions can be realized by one and the same constituent 
with one and the same underlying semantics. The 
action (copyhub function1 function2) is responsible
for identifying the semantics of both grammatical
functions.
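
The four chooser actions can be illustrated with a small interpreter. This is a minimal sketch, assuming choosers are written as nested Python tuples and that the semantics is reached through a callback `answer_query`; the query name `is-question?`, the concept `person-1` and the direction of `copyhub` (copying the binding of the first function to the second) are assumptions made for the example, not details of PENMAN or KPML.

```python
def run_chooser(actions, answer_query, bindings=None):
    """Execute a list of chooser actions.

    Returns the chosen grammatical type (or None) together with the
    function-to-concept bindings that were established on the way.
    """
    bindings = {} if bindings is None else bindings
    chosen = None
    for action in actions:
        op = action[0]
        if op == "ask":
            # (ask query (answer1 actions) (answer2 actions) ...)
            _, query, branches = action
            answer = answer_query(query)
            sub_chosen, _ = run_chooser(branches[answer], answer_query, bindings)
            chosen = sub_chosen or chosen
        elif op == "choose":
            # (choose type): select one output type of the system
            chosen = action[1]
        elif op == "identify":
            # (identify function concept): bind a semantic object
            _, function, concept = action
            bindings[function] = concept
        elif op == "copyhub":
            # (copyhub function1 function2): both functions share one semantics
            _, source, target = action
            bindings[target] = bindings[source]
        else:
            raise ValueError(f"unknown chooser action: {op}")
    return chosen, bindings


# Hypothetical chooser for a system with outputs declarative/interrogative.
MOOD_CHOOSER = [
    ("ask", "is-question?", {
        "yes": [("choose", "interrogative")],
        "no": [("identify", "Subject", "person-1"),
               ("copyhub", "Subject", "Theme"),
               ("choose", "declarative")],
    }),
]
```

Calling `run_chooser(MOOD_CHOOSER, lambda query: "no")` follows the second branch, binds Subject and Theme to the same concept and selects `declarative`.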

Within such a framework, the first stage of sub- 
grammar extraction is to ascertain a representative 
set of grammatical types covering the texts for the 
intended application. This can be obtained by run- 
ning the text generation system within the appli- 
cation with the full unconstrained grammar. All 
grammatical types used during this training stage
are collected to form the backbone for the subgram-
mar to be extracted. We call this cumulative type
set the goal-types. 
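
The collection of goal-types during training amounts to a running set union. The sketch below assumes that the generator can report, for each generated sentence, the set of grammatical types it passed; the function names and the toy type sets are hypothetical.

```python
def accumulate_goal_types(types_per_sentence):
    """Return the cumulative type set and its size after each sentence."""
    goal_types = set()
    growth = []
    for used in types_per_sentence:
        goal_types |= set(used)
        growth.append(len(goal_types))
    return goal_types, growth


def stable_tail(growth):
    """Number of most recent sentences that contributed no new type."""
    count = 0
    for i in range(len(growth) - 1, 0, -1):
        if growth[i] == growth[i - 1]:
            count += 1
        else:
            break
    return count


# Toy training run with three sentences (type names are illustrative only).
TRAINING_RUN = [
    {"clause", "indicative", "declarative"},
    {"clause", "indicative", "declarative", "nominal_group"},
    {"clause", "nominal_group"},
]
```

The `growth` list is the kind of curve one would plot to judge whether enough training sentences have been generated; a long stable tail suggests that the type set has flattened out.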

The list of goal-types then gives the point of depar- 
ture for the second stage, the automatic extraction of 
a consistent subgrammar. goal-types is used as a fil- 
ter against which systems (type axioms) are tested. 
Types not in goal-types have to be excised from the 
subgrammar being extracted. This is carried out 
for the entries of the systems in a preparatory step. 
We assume that the entries are given in disjunctive 
normal form. First, every conjunction containing 
a type which is not in goal-types is removed. Af- 
ter this deletion of unsatisfiable conjunctions, ev- 
ery type in an entry which is not in goal-types is 
removed. The restriction of the outputs of every 
system to the goal-types is done during a simulated 
depth-first traversal through the entire grammati- 
cal type lattice. The procedure works on the type 
lattice with the revised entries. Starting with the 
most general type start (and the most general system, called rank,
which is the system with start as entry), a hierarchy traversal looks
for systems which, although restricted to the type set goal-types,
actually branch, i.e. have more than one type in their output. These
systems constitute the new subgrammar. In essence, each grammatical
system s is examined to see how many of its possible subtypes in
out(s) are used within the target grammar. Those types
which are not used are excised from the subgram- 
mar being extracted. More specific types that are 
dependent on any excised types are not considered 
further during the traversal. Grammatical systems 
where there is only a single remaining unexcised sub- 
type collapse to form a degenerated pseudo-system 
indicating that no grammatical variation is possible 
in the considered application domain. For example,
in the application described in section 3 the system

    indicative = declarative | interrogative.

collapses into

    indicative = declarative.

because questions do not occur in the application domain.
Pseudo-systems of this kind are not kept in the subgrammar. The types
on their right-hand side (pseudo-types) are excised accordingly,
although they are used for deeper traversal, thus defining a path to
more specific systems. Such a path can consist of more than one
pseudo-type if the repeated traversal steps find further degenerated
systems. Constraints defined for pseudo-types are raised, and chooser
actions are percolated down. More precisely, constraints belonging to
a pseudo-type are unified with the constraints of the most general
non-pseudo type at the beginning of the path. Chooser actions



extract-subgrammar(goaltypes)
1  for all s ∈ systems
     do entry(s) := remove-unsatisfiable-features(entry(s))
2  *subgrammar* := ∅
3  traverse-system(rank, start, start, ∅, goaltypes)

traverse-system(s, type, supertype, inheritedconstraints, goaltypes)
1  inter := out(s) ∩ goaltypes
2  if inter ≠ ∅
     then if |entry(s)| = 1 and |inter| = 1
            then do out := the single element in inter
                    constraints := unify(constraints(out), inheritedconstraints)
                    traverse-type(out, supertype, constraints, goaltypes)
            else do entry(s) := dnf-substitute(supertype, type, entry(s))
                    out(s) := inter
                    push(s, *subgrammar*)
                    for all out ∈ inter
                      do traverse-type(out, out, ∅, goaltypes)
                    constraints(supertype) :=
                      unify(constraints(supertype), inheritedrealizations)

traverse-type(type, supertype, inheritedconstraints, goaltypes)
1  who := who-has-in-entry(type)
2  if who = ∅ and inheritedconstraints ≠ ∅
     then do constraints(supertype) :=
               unify(constraints(supertype), inheritedconstraints)
3  for all s ∈ who
     do traverse-system(s, type, supertype, inheritedconstraints, goaltypes)

Figure 1: Subgrammar extraction algorithm



from systems on the path are collected and extend 
the chooser associated with the final (and first not 
pseudo) system of the path. However, in the case 
that a maximal type is reached which is not in goal- 
types, chooser actions have to be raised too. The 
number of goal-types is then usually larger than the 
number of the types in the extracted subgrammar 
because all pseudotypes in goal-types are excised. 
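
The core of the extraction step can be sketched in Python. This is a deliberately simplified illustration and not the implemented procedure of Figure 1: feature constraints and chooser actions are ignored, entries are sets of conjunctions (DNF), and the toy grammar with its system names is an assumption made for the example. It shows the pruning of entries, the restriction of outputs to the goal-types, and the collapse of pseudo-systems, whose single remaining type is skipped while the path to deeper systems is preserved.

```python
def dnf(*conjunctions):
    """Entry in disjunctive normal form: a set of conjunctions of types."""
    return frozenset(frozenset(c) for c in conjunctions)


def prune_entry(entry, goal_types):
    """Drop every conjunction that mentions a type outside the goal-types."""
    return frozenset(c for c in entry if c <= goal_types)


def substitute(entry, old, new):
    """Replace type `old` by `new` everywhere in a DNF entry."""
    return frozenset(frozenset(new if t == old else t for t in c) for c in entry)


def extract_subgrammar(systems, goal_types, start="start"):
    """systems maps a system name to (entry in DNF, set of output types)."""
    goal_types = frozenset(goal_types) | {start}
    pruned = {name: (prune_entry(entry, goal_types), frozenset(out))
              for name, (entry, out) in systems.items()}
    sub = {}

    def traverse_type(type_, anchor):
        for name, (entry, _) in pruned.items():
            if any(type_ in conj for conj in entry):
                traverse_system(name, type_, anchor)

    def traverse_system(name, type_, anchor):
        entry, out = pruned[name]
        inter = out & goal_types
        if not inter:
            return
        simple_entry = len(entry) == 1 and all(len(c) == 1 for c in entry)
        if simple_entry and len(inter) == 1:
            # Pseudo-system: not kept; its single type only defines a path.
            (only,) = inter
            traverse_type(only, anchor)
        else:
            first_visit = name not in sub
            current = entry if first_visit else sub[name][0]
            sub[name] = (substitute(current, type_, anchor), inter)
            if first_visit:
                for t in sorted(inter):
                    traverse_type(t, t)

    traverse_type(start, start)
    return sub


# Toy grammar (hypothetical system names); questions and imperatives are
# absent from the goal-types, so two systems degenerate to pseudo-systems.
TOY = {
    "rank": (dnf({"start"}), {"clause", "group"}),
    "mood": (dnf({"clause"}), {"indicative", "imperative"}),
    "indicative_type": (dnf({"indicative"}), {"declarative", "interrogative"}),
    "tense": (dnf({"declarative"}), {"past", "present"}),
}
GOAL = {"start", "clause", "group", "indicative", "declarative", "past", "present"}
```

For the toy grammar only `rank` and `tense` survive, and the entry of `tense` is rewritten from `declarative` to `clause`, the most general non-pseudo type at the beginning of the path. As in the text, systems with complex entries are kept in this sketch even when only one output remains.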

As the recursion criterion in the traversal, we first simply look for
a system which has the actual type in its revised entry, regardless of
whether it occurs in a conjunction or not. This on its own, however,
oversimplifies the real logical relations between the types and would
create an inconsistent subgrammar. The problem is the conjunctive
inheritance. If the current type occurs in an entry of another system
where it is conjunctively bound, a deeper traversal is in fact only
licensed if the other types of the conjunctions are chosen as well. In
order to perform such a traversal, a breadth traversal with
compilation of all crowns of the lattice (see (Aït-Kaci et al., 1989))
would be necessary. In order to avoid this potentially computationally
very expensive operation without giving up the consistency of the
subgrammar, the implemented subgrammar extraction procedure sketched
in Figure 1 maintains all systems with complex entries (be they
conjunctive or disjunctive) for the subgrammar even if they do not
really branch and collapse to a single-subtype system. (Keeping the
disjunctive systems is not necessary for the consistency, but saves
multiple raising of one and the same constraint.) A related approach
can be found in (O'Donnell, 1992) for the extraction of smaller
systemic subgrammars for analysis.

If the lexicon is organized as or under a complex type hierarchy, the
extraction of an application-tuned lexicon is carried out similarly.
This has the effect that closed class words are removed from the
lexicon if they are not covered in the application domain. Open class
words belonging to word classes not covered by the subgrammar type set
are removed. Some applications do not need their own lexicon for open
class words because they can be linked to an externally provided
domain-specific thesaurus (as is the case for the examples discussed
below). In this case sublexicon extraction is not necessary.

3 Application for text type 'lexicon 
biographies' 

The first trial application of the automatic subgram- 
mar extraction tool has been carried out for an in- 
formation system with an output component that 
generates integrated text and graphics. This in- 
formation system has been developed for the do- 
main of art history and is capable of providing short 
biography articles for around 10 000 artists. The 
underlying knowledge base, comprising half a million semantic
concepts, includes automatically extracted information from 14 000
encyclopedia articles from Macmillan's planned publication "Dictionary
of Art" combined with several additional information sources such as
the Getty "Art and Architecture Thesaurus"; the application is
described in detail in (Kamps et al., 1996). As input the user
clicks on an artist name. The system then performs
content selection, text planning, text and diagram 
generation and page layout automatically. Possible 
output languages are English and German. 

The grammar necessary for short biographical 
articles is, however, naturally much more con- 
strained than that supported by general broad- 
coverage grammars. There are two main reasons 
for this: first, because of the relatively fixed text 
type "encyclopedia biography" involved, and sec- 
ond, particularly in the example information system, 
because of the relatively simple nature of the knowl- 
edge base — this does not support more sophisticated 
text generation as might appear in full encyclopedia 
articles. Without extensive empirical analysis, one 
can already state that such a grammar is restricted 
to main clauses, only coordinative complex clauses, 
and temporal and spatial prepositional phrases. It 
would probably be possible to produce the generated 
texts with relatively complex templates and aggre- 
gation heuristics: but the full grammars for English 
and German available in KPML already covered the 
required linguistic phenomena. 

The application of the automatic subgrammar ex- 
traction tool to this scenario is as follows. 

In the training phase, the information system runs with the full
generation grammar. All grammatical types used during this stage are
collected to yield the cumulative type set goal-types. How many text
examples must be generated in this phase depends on the relative
increase of new information (occurrence of new types) obtained with
every additional sentence generated. We show here the results for two
related text types: 'short artist biographies' and 'artist biography
notes'.

    Roger Hilton was an English painter. He was
    born at Northwood on 23 March 1911, and he
    died at Botallack on 23 February 1975. He
    studied at Slade School in 1929 - 1931. He
    created "February - March 1954", "Grey figure",
    "Oi yoi yoi" and "June 1953 (deep cadmium)".

    Anni Albers is American, and she is a textile
    designer, a draughtsman and a printmaker.
    She was born in Berlin on 12 June 1899. She
    studied art in 1916 - 1919 with Brandenburg.
    Also, she studied art at the Kunstgewerbeschule
    in Hamburg in 1919 - 1920 and the Bauhaus at
    Weimar and Dessau in 1922 - 1925 and 1925 -
    1929. In 1933 she settled in the USA. In 1933 -
    1949 she taught at Black Mountain College in
    North Carolina.

Figure 2: Cumulative type use with sentences from
the short biography text type

Figure 2 shows the growth curve for the type set (vertical axis) with
each additional semantic specification passed from the text planner to
the sentence generator (horizontal axis) for the first of these text
types. The graph shows the cumulative type usage for the first 90
biographies generated, involving some 230 sentences. (This represented
the current extent of the knowledge base when the test was performed.
It is therefore possible that with more texts, the size of the
cumulative set would increase slightly since the curve has not quite
'flattened out'. Explicit procedures for handling this situation are
described below.) The subgrammar extraction for the "short artist
biographies" text type can therefore be performed with respect to the
246 types that are required by the generated texts, applying the
algorithm described above. The resulting extracted subgrammar is a
type lattice with only 144 types. The size of the extracted subgrammar
is only 11% of that of the original grammar. Run times for sentence
generation with this extracted grammar typically range from 55%-75% of
that of the full grammar (see Table 1); in most cases, therefore, less
than one second with the regular KPML generation environment (i.e.,
unoptimized with full debugging facilities resident).

    Nathan Drake was an English painter. He was
    born at Lincoln in 1728, and he died at York
    on 19 February 1778.

Figure 3: Cumulative type use with sentences from
the note biography text type


The generation times are indicative of the style 
of generation implemented by KPML. Clause types 
with more subtypes are likely to cause longer pro- 
cessing times than those with fewer subtypes. When 
there are in any case fewer subtypes available in
the full grammar (as in the existential shown in
Table 1), then there will be a less noticeable
improvement compared with the extracted grammar. In
addition, the run times reflect the fact that the number
of queries being asked by choosers has not yet been
maximally reduced in the current evaluation. Noting
the cumulative set of inquiry responses during the
training phase would provide sufficient information
for more effective pruning of the extracted choosers.

The second example shows similar improvements. 
The very short biography entry is appropriate more 
for figure headings, margin notes, etc. The cumulative
type use graph is shown in Figure 3. With
this 'smaller' text type, the cumulative use stabilizes 
very quickly (i.e., after 39 sentences) at 205 types. 
This remained stable for a test set of 500 sentences. 
Extracting the corresponding subgrammar yields a 
grammar involving only 101 types, which is 7% of 
the original grammar. Sentence generation time is 
accordingly faster, ranging from 40%-60% of that of 



the full grammar. In both cases, it is clear that the
size of the resulting subgrammar is dramatically reduced.
On average, the generation run-time is cut to roughly 2/3
of that of the full grammar. The
run-time space requirements are cut similarly. The
processing time for subgrammar extraction is less 
than one minute, and is therefore not a significant 
issue for improvement. 

4 Conclusions and discussion 

In this paper, we have described how generation re- 
sources for restricted applications can be developed 
drawing on large-scale general generation grammars. 
This enables both re-use of those resources and pro- 
gressive growth as new applications are met. The 
grammar extraction tool then makes it a simple task 
to extract from the large-scale resources specially 
tuned subgrammars for particular applications. Our 
approach shows some similarities to that proposed
by (Rayner and Carter, 1996) for improving parsing
performance by grammar pruning and specialization
with respect to a training corpus. Rule components 
are 'chunked' and pruned when they are unlikely to 
contribute to a successful parse. Here we have shown 
how improvements in generation performance can be 
achieved for generation grammars by removing parts 
of the grammar specification that are not used in 
some particular sublanguage. The extracted gram- 
mar is generally known to cover the target sublan- 
guage and so there is no loss of required coverage. 

Another motivation for this work is the need for 
smaller, but not toy-sized, systemic grammars for 
their experimental compilation into state-of-the-art 
feature logics. The ready access to consistent sub- 
grammars of arbitrary size given with the automatic 
subgrammar extraction reported here allows us to 
investigate further the size to which feature logic 
representations of systemic grammar can grow while 
remaining practically usable. The compilation of the 
full grammar NIGEL has so far only proved possible
for CUF (see (Henschel, 1995)), and the resulting
type deduction runs too slowly for practical
applications.

It is likely that further improvements in generation performance will
be achieved when both the grammatical structures and the extracted
choosers are pruned. The current results have focused primarily on the
improvements brought by reconfiguring the type lattice that defines
the grammar. The structures generated are still the 'full' grammatical
structures that are produced by the corresponding full grammar: if,
however, certain constituent descriptions are always unified
(conflated in systemic terminology) then, analogously to (Rayner and
Carter, 1996), they are candidates for replacement by a single
constituent description in the extracted subgrammar. Moreover, the
extracted choosers can also be pruned directly with respect to the
sublanguage. Currently the pruning carried out is only that entailed
by the type lattice. It is also possible, however, to maintain a
record of the classificatory inquiry responses that are used in a
subgrammar: responses that do not occur can then motivate further
reductions in the choosers that are kept in the extracted grammar.
Evaluation of the improvements in performance that these strategies
bring is in progress.

                      run time (in ms)
              full      sub-      improve-
              grammar   grammar   ment      sentence
worst case    380       300       80        "There is Paul Delaroche."
best case     3250      1830      1430      "John Foster was born in Liverpool
                                            on 1 January c 1787, and he died at
                                            Birkenhead on 21 August 1846."
average case  ca. 900   ca. 590   310       e.g., "Mary Moser was an English
                                            painter." "George Richmond studied
                                            at Royal Academy in 1824."

(Under Allegro Common Lisp running on a Sparc10.)
Table 1: Example run times for "short artist biographies"
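
The relative run times implied by Table 1 can be checked with a few lines of arithmetic. The sketch below only restates the full-grammar and subgrammar figures from the table; the average-case values are given there as approximate ("ca."), so the corresponding ratio is approximate as well.

```python
# (full grammar, subgrammar) run times in ms, as reported in Table 1.
RUN_TIMES_MS = {
    "worst case": (380, 300),
    "best case": (3250, 1830),
    "average case": (900, 590),   # approximate values in the table
}


def relative_run_time(full_ms, sub_ms):
    """Subgrammar run time as a fraction of the full-grammar run time."""
    return sub_ms / full_ms


for case, (full_ms, sub_ms) in RUN_TIMES_MS.items():
    ratio = relative_run_time(full_ms, sub_ms)
    print(f"{case}: subgrammar needs {ratio:.0%} of the full-grammar time")
```

The average case comes out at about two thirds of the full-grammar time, and the best case close to the lower end of the 55%-75% range quoted for this text type.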

One possible benefit of not pruning the chooser decision trees
completely is to provide a fall-back position for when the input to
the generation component in fact strays outside of that expected by
the targeted subgrammar. Paths in the chooser decision tree that do
not correspond to types in the subgrammar can be maintained and marked
explicitly as 'out of bounds' for that subgrammar. This provides a
semantic check that the semantic inputs to the generator remain within
the limits inherent in the extracted subgrammar. If it is sufficiently
clear that these limits will be adhered to, then further extraction
will be free of problems. However, if the demands of an application
change over time, then it is also possible to use the semantic checks
to trigger regeneration with the full grammar: this offers improved
average throughput while maintaining complete generation. Noting
exceptions can also be used to trigger new subgrammar extractions to
adapt to the new application demands. A number of strategies therefore
present themselves for incorporating grammar extraction into the
application development cycle.
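
The fall-back strategy can be sketched as a thin wrapper around two generators. Everything in this example is hypothetical: the `OutOfBounds` exception stands for a chooser path marked 'out of bounds', the two toy generator functions stand in for generation with the subgrammar and with the full grammar, and the 5% threshold for triggering a new extraction is an arbitrary illustrative choice.

```python
class OutOfBounds(Exception):
    """Raised when a chooser path marked 'out of bounds' is reached."""


def generate_with_fallback(spec, sub_generate, full_generate, exceptions_log):
    """Try the extracted subgrammar first; fall back to the full grammar."""
    try:
        return sub_generate(spec)
    except OutOfBounds as exc:
        exceptions_log.append((spec, str(exc)))
        return full_generate(spec)


def needs_reextraction(exceptions_log, total_inputs, threshold=0.05):
    """Suggest a new extraction once too many inputs left the sublanguage."""
    return total_inputs > 0 and len(exceptions_log) / total_inputs > threshold


# Toy stand-ins for the two generators (illustrative only).
def toy_sub_generate(spec):
    if spec.get("mood") == "interrogative":
        raise OutOfBounds("type 'interrogative' is not in the subgrammar")
    return "sub:" + spec["text"]


def toy_full_generate(spec):
    return "full:" + spec["text"]
```

Inputs inside the sublanguage are served by the fast path, while the log of exceptions records how often the application has drifted and can motivate re-running the extraction with an extended set of goal-types.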

Although we have focused here on run-time im- 
provements, it is clear that the grammar extraction 
tool has other possible uses. For example, the ex- 
istence of small grammars is one important contri- 
bution to providing teaching materials. Also, the 
ability to extract consistent subcomponents should
make it more straightforward to combine grammar
fragments as required for particular needs. Further 
validation in both areas forms part of our ongoing re- 
search. Moreover, a significantly larger reduction of 
the type lattice can be expected by starting not from 
the cumulative set of goal-types for the grammar re- 
duction, but from a detailed protocol of jointly used 
types for every generated sentence of the training 
corpus. A clustering technique applied to such a 
protocol is under development. 

Finally, the proposed procedure is not bound to 
systemic grammar and can also be used to extract 
common typed unification subgrammars. Here, 
however, the gain will probably not be as remark- 
able as in systemic grammar. The universal prin- 
ciples of, for example, an HPSG cannot be excised. 
HPSG type hierarchies usually contain mainly gen- 
eral types, so that they will not be affected sub- 
stantially. In the end, the degree of improvement 
achieved depends on the extent to which a grammar 
explicitly includes in its type hierarchy distinctions 
that are fine enough to vary depending on text type. 